[BYOC] JSON Runtime with DNNL End-to-End Flow #5919
Conversation
maybe we should enable dnnl on CI?
yeah, we should. And we should remove the json_runtime_example.
Possibly out of scope for this PR, but is there a plan to support multiple functions/sub-graphs? Currently it looks like there is only support for a single DNNL sub-graph after the graph is partitioned?
We now have only one subgraph per module, but we could have many modules to support multiple subgraphs. Please see @mbaret's comments on this PR and the discussions for details.
Apologies, missed that, thanks
Looks good to me, but please wait on @lhutton1's approval to confirm this is usable with ACL.
It's working well so far, thanks! I think the API to add additional attributes and retrieve them from a JSON node is a bit convoluted, but I think this could always be improved at a later date.
Do we want to wait until dnnl is up on the CI? And what about @zhiics's comment below?
@masahi Thanks, my comment should be resolved with a follow-up PR.
We could wait for CI. It should have been updated to include the DNNL library already (#5936).
Wait for #5985.
Squashed commits:

* json runtime
* json dnnl WIP
* fix ArrayNode usages
* Support composite functions
* DNNL json runtime: conv2d/add/relu/dense/bn
* add a more complex example
* fix bias memory issue
* rebase to upstream
* merge to metadata module, remove the unused driver
* handle constant
* support composite functions
* support DNNL constant
* clean up
* Simplify dnnl user code
* GetDataSize
* fix dense bug
* improve cmake
* zero copy
* add unit test
* move json to contrib/json
* fix cmake
* lint
* max_digits10 for fp serialization
* only keep base getfunction
* fix lint
* zero copy for all data entries
* address comments
* enable ci
* address comment; fix bug
* address comment

Co-authored-by: Zhi Chen <chzhi@amazon.com>
RFC discussion: https://discuss.tvm.ai/t/byoc-runtime-json-runtime-for-byoc/6579
Currently, BYOC allows developers to choose either the C source module or their own customized module as the runtime for their accelerators. While we have provided an end-to-end execution flow for DNNL (i.e., MKL-DNN, OneDNN) using the C source module, we found that many developers prefer to use a customized module to better integrate with their own runtime engine, such as TensorRT. As a result, this PR (in collaboration with @zhiics) provides an end-to-end flow for DNNL using the JSON runtime. Some highlights:
We provide JSON codegen and JSON runtime base classes. The JSON codegen serializes a Relay subgraph to a JSON file, while the JSON runtime base provides deserialization methods to interpret subgraphs in JSON format. Developers can derive from the JSON codegen to easily customize their own codegen, or even use the JSON codegen directly if their runtime engine accepts standard TVM graph runtime JSON.
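To illustrate the idea, here is a minimal sketch of the serialize/deserialize round trip, using a plain node list with op names, inputs, and attributes. This is a toy illustration of the concept, not the actual TVM base classes:

```python
import json

# Toy sketch of the JSON codegen/runtime split: the "codegen" side flattens a
# subgraph into a list of JSON nodes; the "runtime" side parses that list back
# into a graph it can interpret.

def serialize_subgraph(nodes):
    """Serialize a list of node dicts (op, name, inputs, attrs) to JSON."""
    return json.dumps({"nodes": nodes}, indent=2)

def deserialize_subgraph(text):
    """Parse the JSON back and return the node list in topological order."""
    return json.loads(text)["nodes"]

# A toy conv2d -> relu subgraph.
subgraph = [
    {"op": "input",  "name": "data",  "inputs": [],        "attrs": {}},
    {"op": "conv2d", "name": "conv1", "inputs": ["data"],  "attrs": {"kernel": [3, 3]}},
    {"op": "relu",   "name": "relu1", "inputs": ["conv1"], "attrs": {}},
]

text = serialize_subgraph(subgraph)
assert deserialize_subgraph(text) == subgraph  # lossless round trip
```

A derived codegen would override how each operator's attributes are recorded, while the runtime side walks the node list and dispatches each op to its engine.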
We make a case study of leveraging the JSON runtime with DNNL. The DNNL JSON runtime now supports conv2d, dense, relu, batch_norm, and add, so it is able to run MobileNet. Note that the DNNL JSON runtime creates only one DNNL execution engine per subgraph, so it is much more efficient than the C source module version, which creates a DNNL engine for each operator in a subgraph.
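The efficiency claim can be sketched with a toy cost model (the constants below are made up for illustration, not measurements from this PR): engine construction is a fixed overhead, so paying it once per subgraph instead of once per operator amortizes it across the whole graph:

```python
# Toy cost model with illustrative constants (not measured values).
ENGINE_SETUP_MS = 2.0   # fixed cost of creating one DNNL engine
OP_RUN_MS = 0.5         # cost of executing one operator

def subgraph_latency(num_ops, engines):
    """Total latency when `engines` engine creations serve `num_ops` ops."""
    return engines * ENGINE_SETUP_MS + num_ops * OP_RUN_MS

per_op_engines = subgraph_latency(num_ops=30, engines=30)  # C source module style
single_engine = subgraph_latency(num_ops=30, engines=1)    # JSON runtime style
assert single_engine < per_op_engines
```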
DNNL JSON runtime handles constant tensors following the new mechanism in [RUNTIME] Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770.
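The constant-handling idea behind #5770 can be sketched as follows: the serialized graph records only the names of constant tensors, while the weights themselves travel separately (via the metadata module) and are bound by name before execution. This is a hypothetical illustration of the mechanism, not TVM's actual API:

```python
# Sketch: the graph JSON references constants by name only; the weights are
# shipped separately and bound to the runtime before the first run.

class ToyJSONRuntime:
    def __init__(self, const_names):
        self.const_names = const_names  # names recorded at codegen time
        self.consts = {}                # name -> tensor, filled at load time

    def load_constants(self, weights):
        """Bind externally stored weights (e.g. from a metadata module)."""
        for name in self.const_names:
            self.consts[name] = weights[name]

rt = ToyJSONRuntime(const_names=["conv1_weight", "conv1_bias"])
rt.load_constants({"conv1_weight": [[1.0]], "conv1_bias": [0.0], "unused": None})
assert set(rt.consts) == {"conv1_weight", "conv1_bias"}
```

Keeping weights out of the serialized graph means the same compiled artifact can be re-initialized without re-running codegen.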
DNNL codegen with the C source module will be preserved for illustration purposes, and we use CMake to control which DNNL codegen is used. Specifically, both USE_DNNL_CODEGEN ON and USE_DNNL_CODEGEN JSON enable the JSON runtime (this is the default runtime for DNNL). When following the tutorial, which we will update after this PR, users may set USE_DNNL_CODEGEN C_SRC to enable the C source module so that they can learn how it works.
Evaluation
This PR doesn't push the inference performance of the DNNL codegen/runtime. While we leave these issues as future work, here we list the performance problems we have observed:

- write_to_dnnl_memory performs a memory copy from the DLTensor (NDArray) to DNNL memory. This can be avoided by specifying the NDArray pointer when creating a DNNL memory (~5 ms overhead for MobileNet V2).
- We should set OMP_NUM_THREADS wisely. For example, MobileNet V2 with batch_norm simplified achieves 1400 ms on c5.2xlarge; however, if we run it with OMP_NUM_THREADS=16, the latency drops to 65 ms.
  - OMP_NUM_THREADS=16: 65 ms.
  - OMP_NUM_THREADS=16: 16 ms.

In summary, if we resolve all the issues mentioned above, the inference performance of MobileNet V2 on c5.2xlarge should be 16 - 5 = 11 ms.
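For reference, OMP_NUM_THREADS is an environment variable, so it only needs to be set in the benchmark process's environment before launch; the script name below is a placeholder, not a file from this PR:

```python
import os
import subprocess  # used for the (commented-out) placeholder launch

# Pin DNNL's OpenMP thread pool by setting OMP_NUM_THREADS in the child
# environment before launching the benchmark process.
env = dict(os.environ, OMP_NUM_THREADS="16")

# subprocess.run(["python3", "run_mobilenet_v2.py"], env=env)  # placeholder script
print("would launch with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```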
cc @masahi @mbaret @tqchen